Goto

Collaborating Authors

 soft evidence


Unfolding AlphaFold's Bayesian Roots in Probability Kinematics

arXiv.org Artificial Intelligence

We present a novel theoretical interpretation of AlphaFold1 that reveals the potential of generalized Bayesian updating for probabilistic deep learning. The seminal breakthrough of AlphaFold1 in protein structure prediction by deep learning relied on a learned potential energy function, in contrast to the later end-to-end architectures of AlphaFold2 and AlphaFold3. While this potential was originally justified by referring to physical potentials of mean force (PMFs), we reinterpret AlphaFold1's potential as an instance of {\em probability kinematics} -- also known as {\em Jeffrey conditioning} -- a principled but under-recognised generalization of conventional Bayesian updating. Probability kinematics accommodates uncertain or {\em soft} evidence in the form of updated probabilities over a partition. This perspective reveals AlphaFold1's potential as a form of generalized Bayesian updating, rather than a thermodynamic potential. To confirm our probabilistic framework's scope and precision, we analyze a synthetic 2D model in which an angular random walk prior is updated with evidence on distances via probability kinematics, mirroring AlphaFold1's approach. This theoretical contribution connects AlphaFold1 to a broader class of well-justified Bayesian methods, allowing precise quantification, surpassing merely qualitative heuristics based on PMFs. Our contribution is theoretical: we replace AlphaFold1's heuristic analogy with a principled probabilistic framework, tested in a controlled synthetic setting where correctness can be assessed. More broadly, our results point to the considerable promise of probability kinematics for probabilistic deep learning, by allowing the formulation of complex models from a few simpler components.


Bayesian Neural Networks with Soft Evidence

arXiv.org Machine Learning

Bayes's rule deals with hard evidence, that is, we can calculate the probability of event $A$ occuring given that event $B$ has occurred. Soft evidence, on the other hand, involves a degree of uncertainty about whether event $B$ has actually occurred or not. Jeffrey's rule of conditioning provides a way to update beliefs in the case of soft evidence. We provide a framework to learn a probability distribution on the weights of a neural network trained using soft evidence by way of two simple algorithms for approximating Jeffrey conditionalization. We propose an experimental protocol for benchmarking these algorithms on empirical datasets, even when the data is purposely corrupted.


The Mathematics of Changing Oneโ€™s Mind, via Jeffreyโ€™s or via Pearlโ€™s Update Rule

Journal of Artificial Intelligence Research

Evidence in probabilistic reasoning may be โ€˜hardโ€™ or โ€˜softโ€™, that is, it may be of yes/no form, or it may involve a strength of belief, in the unit interval [0, 1]. Reasoning with soft, [0, 1]-valued evidence is important in many situations but may lead to different, confusing interpretations. This paper intends to bring more mathematical and conceptual clarity to the field by shifting the existing focus from specification of soft evidence to accomodation of soft evidence. There are two main approaches, known as Jeffreyโ€™s rule and Pearlโ€™s method; they give different outcomes on soft evidence. This paper argues that they can be understood as correction and as improvement. It describes these two approaches as different ways of updating with soft evidence, highlighting their differences, similarities and applications. This account is based on a novel channel-based approach to Bayesian probability. Proper understanding of these two update mechanisms is highly relevant for inference, decision tools and probabilistic programming languages.


On the Relative Expressiveness of Bayesian and Neural Networks

arXiv.org Artificial Intelligence

Shortly after the field was born in the 1950s, the focus turned to symbolic, model-based approaches, which were premised on the need to represent and reason with domain knowledge, and exemplified by the use of logic to represent such knowledge (McCarthy, 1959). In the 1980s, the focus turned to probabilistic, model-based approaches, as exemplified by Bayesian networks and probabilistic graphical models more generally (first major milestone) (Pearl, 1988). Starting in the 1990s, and as data became abundant, these probabilistic models provided the foundation for much of the research in machine learning, where models were learned either generatively or discriminatively from data. Recently, the field shifted its focus to numeric, functionbased approaches, as exemplified by neural networks, which are trained discriminatively using labeled data (deep learning, second major milestone) (Goodfellow et al., 2016; Hinton et al., 2006; Bengio et al., 2006; Ranzato et al., 2006; Rosenblatt, 1958; McCulloch & Pitts, 1943). Perhaps the biggest surprise with the second milestone is the extent to which certain tasks, associated with perception or limited forms of cognition, can be approximated using functions (i.e., neural networks) that are learned purely from labeled data, without the need for modeling or reasoning (Darwiche, 2018).


A Mathematical Account of Soft Evidence, and of Jeffrey's `destructive' versus Pearl's `constructive' updating

arXiv.org Artificial Intelligence

Evidence in probabilistic reasoning may be `hard' or `soft', that is, it may be of yes/no form, or it may involve a strength of belief, in the unit interval [0,1]. Reasoning with soft, $[0,1]$-valued evidence is important in many situations but may lead to different, confusing interpretations. This paper intends to bring more mathematical clarity to the field by shifting the existing focus from specification of soft evidence to accomodation of soft evidence. There are two main approaches, known as Jeffrey's rule and Pearl's method, which give different outcomes on soft evidence. This paper describes these two approaches as different ways of updating with soft evidence, highlighting their differences, similarities and applications. This account is based on a novel channel-based approach to Bayesian probability. Proper understanding of these two update mechanisms is highly relevant for inference, decision tools and probabilistic programming languages.


New Advances and Theoretical Insights into EDML

arXiv.org Machine Learning

EDML is a recently proposed algorithm for learning MAP parameters in Bayesian networks. In this paper, we present a number of new advances and insights on the EDML algorithm. First, we provide the multivalued extension of EDML, originally proposed for Bayesian networks over binary variables. Next, we identify a simplified characterization of EDML that further implies a simple fixed-point algorithm for the convex optimization problem that underlies it. This characterization further reveals a connection between EDML and EM: a fixed point of EDML is a fixed point of EM, and vice versa. We thus identify also a new characterization of EM fixed points, but in the semantics of EDML. Finally, we propose a hybrid EDML/EM algorithm that takes advantage of the improved empirical convergence behavior of EDML, while maintaining the monotonic improvement property of EM.


Exact Lifted Inference with Distinct Soft Evidence on Every Object

AAAI Conferences

The presence of non-symmetric evidence has been a barrier for the application of lifted inference since the evidence destroys the symmetry of the first-order probabilistic model. In the extreme case, if distinct soft evidence is obtained about each individual object in the domain then, often, all current exact lifted inference methods reduce to traditional inference at the ground level. However, it is of interest to ask whether the symmetry of the model itself before evidence is obtained can be exploited. We present new results showing that this is, in fact, possible. In particular, we show that both exact maximum a posteriori (MAP) and marginal inference can be lifted for the case of distinct soft evidence on a unary Markov Logic predicate. Our methods result in efficient procedures for MAP and marginal inference for a class of graphical models previously thought to be intractable.


Sensitivity Analysis in Bayesian Networks: From Single to Multiple Parameters

arXiv.org Artificial Intelligence

Previous work on sensitivity analysis in Bayesian networks has focused on single parameters, where the goal is to understand the sensitivity of queries to single parameter changes, and to identify single parameter changes that would enforce a certain query constraint. In this paper, we expand the work to multiple parameters which may be in the CPT of a single variable, or the CPTs of multiple variables. Not only do we identify the solution space of multiple parameter changes that would be needed to enforce a query constraint, but we also show how to find the optimal solution, that is, the one which disturbs the current probability distribution the least (with respect to a specific measure of disturbance). We characterize the computational complexity of our new techniques and discuss their applications to developing and debugging Bayesian networks, and to the problem of reasoning about the value (reliability) of new information.


EDML: A Method for Learning Parameters in Bayesian Networks

arXiv.org Artificial Intelligence

We propose a method called EDML for learning MAP parameters in binary Bayesian networks under incomplete data. The method assumes Beta priors and can be used to learn maximum likelihood parameters when the priors are uninformative. EDML exhibits interesting behaviors, especially when compared to EM. We introduce EDML, explain its origin, and study some of its properties both analytically and empirically.